Abstract: This paper deals with the problem of identifying
elephants in Internet traffic. The aim is to analyze a new
adaptive algorithm based on a Bloom filter. This algorithm uses
a so-called min-rule, which can be described by the supermarket
model. This model consists of joining the shortest queue among
d queues selected at random from a large number m of queues. In
case of a tie, one of the shortest queues is chosen at random.
An analysis of a simplified model gives insight into the error
generated by the algorithm in the estimation of the number of
elephants. The main conclusion is that, as m gets large, there
is a deterministic limit for the empirical distribution of the filter
counters. Limit theorems are proved and the limit is identified.
It depends on key parameters. The condition for the algorithm
to perform well is discussed. Theoretical results are validated
by experiments on a traffic trace from France Telecom and by
simulations.



I. Introduction 

To be efficient, network traffic measurement methods have 
to be adapted to the actual traffic characteristics. Internet links 
are currently carrying a huge amount of data at a very high 
bit rate (40 Gb/s in OC-768). To analyze this traffic on-line,
scalable algorithms are required. They have to operate fast,
using a small amount of memory. The traffic is mainly analyzed
at the flow level. A flow is a sequence of packets defined by
the classical 5-tuple composed of the source and destination
addresses, the source and destination port numbers together
with the protocol type. Flow statistics are very useful for
traffic engineering and network management. In particular, 
information about large flows (also called elephants) is very 
interesting for many applications. Note that an elephant is a 
flow with at least K packets, where K is in practice equal 
to 20. Elephants are not numerous (around 5 to 20% of the 
number of flows), but they represent the main part (80-90 %) 
of the traffic volume in terms of packets. Elephant statistics
can be exploited in various fields such as attack detection
or accounting. In the literature, some probabilistic algorithms
have been developed to estimate on-line the number of elephants
in dense traffic. In [1], Flajolet analyzed the Adaptive
Sampling algorithm proposed by Wegman. This algorithm is
based on a special sampling method that provides a random
set of the original flows. Some characteristics of elephants
(number, size distribution) can be inferred from this sample.
Other sampling-based algorithms (see [2], [3]) are
designed to provide elephant statistics, but they seem to be
tailored to very specific flow size distributions or require
a priori knowledge of the total number of flows to recover
the information lost through sampling. Moreover, none of
these algorithms is able to identify elephants, that is, to
give their addresses. Such information is particularly useful
for attack detection.

To this end, Estan and Varghese [4] propose an algorithm
based on Bloom filters. This algorithm is fast enough and
uses limited memory, but it does not adapt to traffic
variations: it uses a fixed parameter which must be adjusted
according to traffic intensity. Azzana in [5], then Chabchoub et
al. [6], propose an improvement of this algorithm by adding
a refreshment mechanism that depends on traffic variations.
The principle of this latter algorithm is the following. The
filter is composed of d stages. Each stage contains m counters
and is associated with a hashing function. When a packet is
received, its IP header is hashed by the d independent hashing
functions and the corresponding counter in each stage is
incremented by one. When the smallest counter associated with
a flow reaches K (the smallest elephant size), the flow is
considered an elephant. Due to the heavy Internet traffic, the
filter must be refreshed from time to time; otherwise all the
counters would exceed the threshold K, and all flows would be
seen as elephants. The idea is to decrease all non-null counters
by one every time the proportion of non-null counters reaches a
given threshold r. In this way, the refreshment frequency of the
filter depends closely on the actual traffic intensity. Notice
that the algorithm uses an improvement, the min-rule, also
called conservative update in [4]. For an arriving packet, it
consists of incrementing only those of the d counters that
have the minimum value. Indeed, because of collisions, each
counter can only overestimate the flow size, so the smallest
associated counter is the tightest upper bound on it. The
min-rule thus reduces the overestimation of flow size. This
algorithm was first presented in [5]. In [6], a more complete
version of the algorithm is developed, in which a new refreshment
mechanism based on the average of the counter values is added.
In addition, the algorithm (with some modifications) is applied
to attack detection. These algorithms are validated using several
traffic traces.
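The scheme just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of [5], [6]: the class name MultistageFilter is hypothetical, the d hashing functions are modeled by SHA-256 salted with the stage number, the filling ratio is assumed to be measured over all d·m counters, a flow is flagged once its smallest counter reaches K, and the refreshment rule based on the average counter value from [6] is not modeled.

```python
import hashlib


class MultistageFilter:
    """Hypothetical sketch: d stages of m counters, conservative update
    (min-rule), refreshment when the fraction of non-null counters reaches r."""

    def __init__(self, d, m, K, r):
        self.d, self.m, self.K, self.r = d, m, K, r
        self.stages = [[0] * m for _ in range(d)]
        self.non_null = 0              # non-null counters over all stages
        self.elephants = set()

    def _index(self, flow_id, stage):
        # Assumed hash family: SHA-256 salted with the stage number.
        digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def _refresh(self):
        # Decrease every non-null counter by one.
        for stage in self.stages:
            for i, c in enumerate(stage):
                if c > 0:
                    stage[i] = c - 1
                    if c == 1:
                        self.non_null -= 1

    def add_packet(self, flow_id):
        """Process one packet; return True if its flow is flagged as an elephant."""
        if flow_id in self.elephants:
            return True                # detected elephants no longer feed the filter
        idx = [self._index(flow_id, s) for s in range(self.d)]
        vals = [self.stages[s][idx[s]] for s in range(self.d)]
        low = min(vals)
        for s in range(self.d):
            if vals[s] == low:         # min-rule: only the smallest counters grow
                if low == 0:
                    self.non_null += 1
                self.stages[s][idx[s]] += 1
        if low + 1 >= self.K:          # the smallest counter has reached K
            self.elephants.add(flow_id)
        if self.non_null >= self.r * self.d * self.m:
            self._refresh()
        return flow_id in self.elephants
```

Under these assumptions each packet costs at most d counter updates, and a refreshment is a single pass over the counters.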

Chabchoub et al. present in [7] a theoretical analysis of 
the algorithm proposed by Azzana and described above. Their 
objective is to estimate the error generated by the algorithm
in the estimation of the number of elephants. The analytical study
does not take the min-rule into account.

In this paper, we focus on the analysis of the min-rule. For 
this purpose, the algorithm described above has been slightly 
modified. We consider now just one filter and two hashing 
functions (d = 2). An arriving packet increments the smallest 
counter among the two associated counters. In case of a tie,
only one of them, chosen at random, is incremented. In this way,
every packet increments exactly one counter. A flow is declared
an elephant when its smallest associated counter reaches C =
K/d. The same refreshment mechanism is kept, with a
threshold r of about 50%. The basic idea is that when the
filter is not overloaded, the packets of a given flow generally
increment its two counters alternately. Hence the two counters
have almost the same values, and when the smaller one reaches
C, the corresponding flow has a total size of about K = dC
packets (C packets hitting each counter).
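The variant analyzed in this paper changes only the update step. The Python sketch below is illustrative, not the authors' code: SupermarketFilter is a hypothetical name, the hashing functions are again modeled by salted SHA-256, ties are broken with a seeded pseudo-random generator, and K is assumed to be a multiple of d so that C = K/d is an integer.

```python
import hashlib
import random


class SupermarketFilter:
    """Hypothetical sketch of the d-choice variant: one array of m counters;
    each packet increments exactly one counter (supermarket rule)."""

    def __init__(self, m, K, r, d=2, seed=0):
        self.m, self.d, self.r = m, d, r
        self.C = K // d                # detection threshold C = K/d
        self.counters = [0] * m
        self.non_null = 0
        self.refreshes = 0
        self.elephants = set()
        self.rng = random.Random(seed)

    def _index(self, flow_id, j):
        # Assumed j-th hashing function: SHA-256 salted with j.
        digest = hashlib.sha256(f"{j}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def _refresh(self):
        self.refreshes += 1
        for i, c in enumerate(self.counters):
            if c > 0:
                self.counters[i] = c - 1
                if c == 1:
                    self.non_null -= 1

    def add_packet(self, flow_id):
        """Process one packet; return True if its flow is flagged as an elephant."""
        if flow_id in self.elephants:
            return True
        idx = [self._index(flow_id, j) for j in range(self.d)]
        low = min(self.counters[i] for i in idx)
        # Join the shortest queue, ties broken at random.
        target = self.rng.choice([i for i in idx if self.counters[i] == low])
        if low == 0:
            self.non_null += 1
        self.counters[target] += 1     # exactly one counter per packet
        if min(self.counters[i] for i in idx) >= self.C:
            self.elephants.add(flow_id)
        if self.non_null >= self.r * self.m:
            self._refresh()
        return flow_id in self.elephants
```

Because every packet increments exactly one counter, the sum of the counters grows by exactly one per inserted packet.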

The advantage of this new algorithm is that each arriving 
packet increments exactly one counter. In this case the way to 
increment the counters is exactly, in a system of m queues, 
the way a customer joins the shortest queue among d queues 
chosen at random, ties being broken at random. This model,
called the supermarket model by Luczak and McDiarmid [8],
[9], and also known as the load-balancing model with choice,
has been extensively studied in the literature because of its
numerous useful applications. In computer science, the central
result is stated in a pioneering paper by Azar et al. [10], then
by Mitzenmacher [11], for a discrete-time model where n balls
are thrown into n urns with choice. It is proved that,
with probability tending to 1 as n gets large, the maximum
load of an urn is $(1+o(1))\log n/\log\log n$ when d = 1 and
$\log\log n/\log d + O(1)$ if d ≥ 2. Luczak and McDiarmid, in
related continuous-time models with choice, explore the
concentration of the maximum queue length (see [8], [12]).
The model had also been studied in Vvedenskaya
et al. [13], Graham [14] and others for mean-field limit
theorems. In [13], a functional law of large numbers is stated:
the vector of the tail proportions of queues converges in
distribution as m tends to infinity to the unique solution of
a differential system. This system has a unique fixed point
$u_\rho(k) = \rho^{(d^k-1)/(d-1)}$ for a load ρ < 1 per queue. It means
that, when d ≥ 2, the tail probabilities of the queue length
decrease drastically (doubly exponentially in k). In Vvedenskaya and Suhov [15], variants
of the choice policy and general service time distributions are 
investigated. Graham in [14] proves the convergence of the 
invariant measures to some Dirac measure. In other words, 
when m is large, the stationary vector of the proportions of 
queues with k customers is essentially deterministic and given 
by this limit. 
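To make the contrast concrete, the fixed point quoted above, $u_\rho(k) = \rho^{(d^k-1)/(d-1)}$ (read as $\rho^k$ when d = 1), can be tabulated in a few lines. The function name below is hypothetical.

```python
def supermarket_tail(rho, d, kmax):
    """Hypothetical helper: fixed-point tail u(k), k = 0..kmax, of the
    mean-field supermarket model with load rho < 1 per queue."""
    tails = []
    for k in range(kmax + 1):
        # 1 + d + ... + d^(k-1) is an integer, hence the exact division
        exponent = k if d == 1 else (d ** k - 1) // (d - 1)
        tails.append(rho ** exponent)
    return tails


for d in (1, 2):
    print(d, [round(u, 4) for u in supermarket_tail(0.9, d, 5)])
```

For ρ = 0.9, the fraction of queues with at least 5 customers is about 0.59 with d = 1 but below 0.04 with d = 2.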

The aim of this paper is to model the min-rule via the 
supermarket model and to evaluate the performance of the 
new proposed algorithm. In particular, we want to calculate
the error generated by the algorithm in the estimation of the
number of elephants. Notice that this error is due to both
false negatives (missed elephants) and false positives (mice
considered as elephants). Let us focus on false positives. To
be declared an elephant, a mouse must be hashed to d counters
whose values are all at least C after its insertion. So the
proportion of counters at C is a good parameter to investigate
in order to evaluate false positives.

Most of the paper is devoted to the analysis of a simple
model where the flows are mice of size one. This is relevant
because most flows are mice, so collisions between
flows are mainly due to collisions between mice. It turns out
that the probability that a mouse is detected as an elephant is
bounded by the probability that a given counter is at least
C just before a refreshment time. Thus the problem reduces
to analyzing the behavior of the model at the refreshment times.
Moreover, the transition phase is very short, so the study of
the stationary behavior is pertinent.

The key idea of the study is to use the Markovian framework 
in order to rigorously establish limit theorems and analytical 
expressions in the stationary regime. The main result is that, as 
m tends to infinity, the evolution of the model is characterized 
by a dynamical system which has at least one fixed point.
When d = 1, this fixed point is unique and is denoted by w.
In this paper, we conjecture its uniqueness when d ≥ 2.
The interpretation of w as a key quantity in a supermarket
model with deterministic service times is discussed. Analytical
expressions, which are given in [7] for d = 1, are more complicated
to obtain here.

An objective would be to prove the convergence of the
invariant measure of the Markov chain, as m tends to +∞,
to the Dirac measure $\delta_w$ at the fixed point w. In practice, such
a result is crucial. If it does not hold, that is, if the sequence
of invariant measures does not converge, the system oscillates,
with long periods of transition between different configurations
(metastability phenomenon). So even if the algorithm performs
well for a while, it can reach another state where it gives
bad results. This question is only partially addressed here:
the convergence of the invariant measures is conjectured, based
on simulations of the algorithm in which no such phenomenon has
been observed. To obtain such a result, a possible technique is
to exhibit a Lyapounov function which proves both the
convergence of the dynamical system to its unique fixed point
w and the convergence of the sequence of invariant measures
to $\delta_w$. Such a function is exhibited in [7] for d = 1 and C = 2.

The limit distribution w is also simulated for mice with a
uniform size distribution. Experiments have two goals.
First, to compare the original version presented in [6] with the
version of the algorithm introduced here. Second, the time 
between two refreshments is plotted. This quantity is crucial 
for the trade-off between false positives and false negatives. 
The time to reach the stationary phase is discussed. 

The organization of the paper is as follows: Section II 
presents the analytical results for the simple model defined 
to study the question of false positives. Section III is devoted 
to experiments. Section IV gives a discussion of the way to 
choose the parameter r in order to have an algorithm which 
performs well. 

II. The Markovian urn and ball model 

A. Description of the model 

In this section, the question of false positives is addressed: 
the probability for a mouse to be detected by the algorithm as 
an elephant. 

The problem is studied in a simple framework, where flows
are reduced to mice of size one. The model can then be
described as an urn and ball model, because flows of size one
hashed into a filter with m counters can be viewed as balls
thrown into m urns with capacity C under the supermarket
rule: for each ball, a subset of d urns is chosen at random
and the ball is put into the least loaded urn, ties being broken
uniformly at random. Balls overflowing the capacity C are rejected.
If, after putting the ball, the number of non-empty urns reaches
⌈rm⌉, then one ball is removed from every non-empty urn.

The probability that a flow is detected as an elephant thus
reduces to the probability that, after the ball arrives, all the
d chosen urns contain C balls. It is bounded by the probability
that, just before a refreshment time, after the last ball is put
in its urn, all the d urns chosen for it contain C balls. The
bound is more convenient to study, since it only involves the
embedded model just before the refreshment times, which is
studied below.
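A direct simulation of this urn and ball model is a useful sanity check for the analysis that follows. The Python sketch below is illustrative only: the function name and the seeded generator are assumptions, the d urns are drawn without replacement (a "subset"), and a ball is counted as a false positive when, after its arrival, all d chosen urns contain C balls, which is the event described above.

```python
import math
import random


def simulate_urns(m, C, d, r, n_refresh, seed=0):
    """Hypothetical sketch of the urn and ball model: unit balls, m urns of
    capacity C, supermarket rule with d choices, refreshment when ceil(r*m)
    urns are non-empty. Returns (snapshots, false_positive_rate), where the
    snapshots are the vectors (w_0, ..., w_C) just before each refreshment."""
    rng = random.Random(seed)
    urns = [0] * m
    threshold = math.ceil(r * m)
    non_empty, thrown, flagged = 0, 0, 0
    snapshots = []
    while len(snapshots) < n_refresh:
        chosen = rng.sample(range(m), d)          # d distinct urns
        low = min(urns[i] for i in chosen)
        target = rng.choice([i for i in chosen if urns[i] == low])
        thrown += 1
        if low < C:                               # otherwise the ball is rejected
            if low == 0:
                non_empty += 1
            urns[target] += 1
        if all(urns[i] == C for i in chosen):
            flagged += 1                          # this mouse would look like an elephant
        if non_empty >= threshold:
            counts = [0] * (C + 1)
            for x in urns:
                counts[x] += 1
            snapshots.append([c / m for c in counts])
            for i in range(m):                    # remove one ball per non-empty urn
                if urns[i] > 0:
                    urns[i] -= 1
                    if urns[i] == 0:
                        non_empty -= 1
    return snapshots, flagged / thrown
```

Averaging the snapshots over many refreshments, for increasing m, gives an empirical view of the limits studied below.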

B. A Markovian framework 

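For fixed C, let us consider the sequence $(W^m_n)_{n\in\mathbb{N}}$, where
$W^m_n$ denotes the vector of the proportions of urns with
0, ..., C balls just before the nth refreshment time. For m ≥ 1,
$(W^m_n)_{n\in\mathbb{N}}$ is an ergodic Markov chain on the finite state
space

$$\mathcal{P}^{(r)}_m = \Big\{ w \in \big(\tfrac{1}{m}\mathbb{N}\big)^{C+1} : \sum_{i=0}^{C} w_i = 1,\ \sum_{i=1}^{C} w_i = \frac{\lceil rm \rceil}{m} \Big\}$$

(where ⌈rm⌉ denotes the smallest integer greater than or equal to rm).
Thus it has a unique invariant measure $\pi_m$.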

The problem is that this quantity is combinatorially intractable.
Even the transition probability $P_m$ of the Markov
chain is very difficult to write down. Nevertheless, one can
expect an asymptotic description of this quantity when m is large.
In other words, the limit of the invariant measures $\pi_m$ as m
gets large is investigated.

C. A dynamical system 

The approach used here to obtain limit theorems is
classical (see [16] for example). In fact, similar results for
d = 1 can be found in Chabchoub et al. [7]. The following
results extend the case d = 1 to d ≥ 2, which is of course the
motivation here. The proofs must often be rewritten with new
arguments, and the parts that remain valid are generally
omitted.

The following result states that, as m → +∞, the Markov chain
converges in distribution to a deterministic dynamical system,
which is made explicit below.

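Let

$$\mathcal{P} = \Big\{ w \in \mathbb{R}_+^{C+1} : \sum_{i=0}^{C} w_i = 1 \Big\}
\quad \text{and} \quad
\mathcal{P}^{(r)} = \Big\{ w \in \mathcal{P} : \sum_{i=1}^{C} w_i = r \Big\}$$

be the state spaces. Let the shift s be defined on $\mathcal{P}$ by

$$s(w) = (w_0 + w_1, w_2, \ldots, w_C, 0)$$

and

$$\lambda : \mathcal{P}^{(r)} \to \mathbb{R}_+, \qquad w \mapsto \int_{r-w_1}^{r} \frac{du}{1-u^d}.$$

For the vector of proportions $v \in \mathcal{P}$, it is more convenient
to deal with the vector of the tail proportions u defined by
$u_k = \sum_{i \ge k} v_i$. G is then defined on $\mathcal{P}^{(r)}$ by

$$G(w) = v(\lambda(w)) \qquad (1)$$

where (v(t)) is associated with (u(t)), the unique solution of

$$\frac{du_k}{dt} = u_{k-1}^d - u_k^d, \quad k \in \{1, \ldots, C\}, \quad u_0 = 1,$$

with initial condition u(0) corresponding to v(0) = s(w).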
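The map G can be evaluated numerically, which is convenient for exploring its fixed points. In the Python sketch below, the function names are hypothetical, λ(w) is computed with a midpoint rule and the differential system is integrated with a classical fourth-order Runge-Kutta scheme; the step counts and the starting point are arbitrary choices, not values from the paper.

```python
def lam(w, r, d, n=400):
    """Hypothetical helper: lambda(w), the integral of du / (1 - u^d)
    from r - w_1 to r, by the midpoint rule."""
    a = r - w[1]
    h = w[1] / n
    return sum(h / (1.0 - (a + (i + 0.5) * h) ** d) for i in range(n))


def G(w, r, d, steps=400):
    """Hypothetical helper: apply the shift s, integrate
    du_k/dt = u_{k-1}^d - u_k^d (u_0 = 1) on the tails for a time
    lambda(w), and return the proportions (v_0, ..., v_C)."""
    C = len(w) - 1
    v0 = [w[0] + w[1]] + list(w[2:]) + [0.0]          # s(w)
    u = [sum(v0[k:]) for k in range(C + 1)]           # tails, u[0] = 1

    def rhs(x):
        return [0.0] + [x[k - 1] ** d - x[k] ** d for k in range(1, C + 1)]

    h = lam(w, r, d) / steps
    for _ in range(steps):                            # classical RK4 step
        k1 = rhs(u)
        k2 = rhs([x + h / 2 * y for x, y in zip(u, k1)])
        k3 = rhs([x + h / 2 * y for x, y in zip(u, k2)])
        k4 = rhs([x + h * y for x, y in zip(u, k3)])
        u = [x + h / 6 * (a + 2 * b + 2 * c + e)
             for x, a, b, c, e in zip(u, k1, k2, k3, k4)]
    return [u[k] - u[k + 1] for k in range(C)] + [u[C]]


w = [0.5, 0.2, 0.15, 0.1, 0.05]    # a point of P^(r) with r = 0.5, C = 4
for _ in range(30):
    w = G(w, r=0.5, d=2)            # iterate w_{n+1} = G(w_n)
```

Since the non-empty proportion $u_1$ solves $du_1/dt = 1 - u_1^d$, starting from $r - w_1$ it reaches exactly r at time λ(w), so the iterates stay in $\mathcal{P}^{(r)}$ up to discretization error.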

Proposition 1: If $W^m_0$ converges in distribution to $w \in \mathcal{P}^{(r)}$,
then $(W^m_n)_{n\in\mathbb{N}}$ converges in distribution to the dynamical
system $(w_n)_{n\in\mathbb{N}}$ given by the recursion

$$w_0 = w, \qquad w_{n+1} = G(w_n), \quad n \in \mathbb{N}.$$

Notice that, by definition of λ, G maps $\mathcal{P}^{(r)}$ to $\mathcal{P}^{(r)}$.

Proof: The result is a consequence of the convergence of
the transition $P_m$ of the Markov chain $(W^m_n)_{n\in\mathbb{N}}$, as m tends
to +∞, to the transition P given by

$$P(w, \cdot) = \delta_{G(w)}.$$

It means that, starting from w just before a refreshment time, at
the next refreshment time the vector of the proportions of urns
tends to G(w) when m tends to +∞. The uniform convergence
stated by the following lemma is the convenient way to
prove Proposition 1.

Lemma 1: For ε > 0,

$$\sup_{w \in \mathcal{P}^{(r)}_m} P_m\big(w, \{ w' \in \mathcal{P}^{(r)}_m : \| w' - G(w) \| > \varepsilon \}\big) \xrightarrow[m \to +\infty]{} 0.$$

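Proof: The idea of the proof is that, starting from w
(with ⌈rm⌉ non-empty urns), after refreshment, the vector of
the proportions is s(w), defined by

$$s(w) = (w_0 + w_1, w_2, \ldots, w_C, 0),$$

where the proportion of non-empty urns is r − $w_1$. Then a
number $\tau^m_1$ of balls are thrown in order to reach a state w'
with again ⌈rm⌉ non-empty urns. It has to be proved that w'
is close to G(w). There are three steps:

1) It can be proved that this number $\tau^m_1$ is deterministic at
first order, equivalent to λ(w)m when m is large, where

$$\lambda(w) = \int_{r-w_1}^{r} \frac{du}{1-u^d}. \qquad (2)$$

More precisely, for ε > 0,

$$\sup_{w \in \mathcal{P}^{(r)}_m} P_w\Big( \Big| \frac{\tau^m_1}{m} - \lambda(w) \Big| > \varepsilon \Big) \xrightarrow[m \to +\infty]{} 0. \qquad (3)$$

To see it, starting from w, $\tau^m_1$ can be written as a sum
of the numbers $Y_l$ of balls necessary to hit the (l+1)th
non-empty urn. Indeed,

$$\tau^m_1 = \sum_{l = \lceil rm \rceil - m w_1}^{\lceil rm \rceil - 1} Y_l, \qquad (4)$$

where the $Y_l$'s, l ∈ N, are independent random variables with
geometric distributions on N* with respective parameters

$$a_l = \prod_{j=0}^{d-1} \frac{l-j}{m-j},$$

i.e. $P(Y_l = n) = a_l^{\,n-1}(1 - a_l)$, n ≥ 1. Here $a_l$ is the probability
that the d urns chosen for a ball are all non-empty when l urns are
non-empty.

As $E(Y_l) = 1/(1 - a_l)$, computing the mean and comparing
this sum with integrals leads to

$$\sup_{w \in \mathcal{P}^{(r)}_m} \Big| \frac{E_w(\tau^m_1)}{m} - \lambda(w) \Big| \xrightarrow[m \to +\infty]{} 0. \qquad (5)$$

At the same time, $\mathrm{Var}(Y_l) = a_l/(1 - a_l)^2$ and $a_l \le (l/m)^d < r^d$
for every $l < \lceil rm \rceil$; since the sum (4) has $m w_1$ terms,

$$\sup_{w \in \mathcal{P}^{(r)}_m} \mathrm{Var}_w\Big( \frac{\tau^m_1}{m} \Big) \le \sup_{w \in \mathcal{P}^{(r)}_m} \frac{w_1}{m\,(1 - r^d)^2} \le \frac{1}{m\,(1 - r^d)^2} \xrightarrow[m \to +\infty]{} 0. \qquad (6)$$

By the Bienaymé-Chebyshev inequality, (5) and (6) yield (3).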
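Step 1 can be checked by Monte Carlo, by drawing the geometric variables $Y_l$ of (4) directly and comparing $\tau^m_1/m$ with the integral (2). The Python sketch below is illustrative; the function names, the sample size, the parameter values and the seed are assumptions. A ball misses the empty urns with probability $a_l$, so $Y_l$ is sampled as the index of the first success.

```python
import math
import random


def sample_tau(m, r, w1, d, rng):
    """Hypothetical helper: one draw of tau_1^m = sum of Y_l,
    l = ceil(rm) - m*w1, ..., ceil(rm) - 1, as in (4)."""
    top = math.ceil(r * m)
    total = 0
    for l in range(top - round(m * w1), top):
        a = 1.0
        for j in range(d):
            a *= (l - j) / (m - j)      # all d chosen urns are non-empty
        y = 1
        while rng.random() < a:         # the ball lands in a non-empty urn
            y += 1
        total += y
    return total


def lam(r, w1, d, n=10000):
    """Hypothetical helper: lambda(w) from (2) by the midpoint rule."""
    h = w1 / n
    return sum(h / (1.0 - (r - w1 + (i + 0.5) * h) ** d) for i in range(n))


rng = random.Random(42)
m, r, w1, d = 5000, 0.5, 0.1, 2
draws = [sample_tau(m, r, w1, d, rng) / m for _ in range(20)]
print(sum(draws) / len(draws), lam(r, w1, d))
```

With these values the two printed numbers should agree to within a few thousandths, and the spread of the individual draws shrinks as m grows, in line with (6).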

2) From the previous fact, there is a natural coupling between
throwing $\tau^m_1$ balls and throwing ⌊λ(w)m⌋ balls, under which
the vectors of proportions $W^m_1$ and, say, $\widetilde W^m_1$ are close to
each other. Then there is also a coupling between throwing
⌊λ(w)m⌋ balls and throwing a Poisson number of balls with parameter
λ(w)m, for which, by Chernoff's inequality, the vectors of
proportions $\widetilde W^m_1$ and, say, $\widehat W^m_1$ are close.

3) The vector of proportions $\widehat W^m_1$ obtained by coupling
is the vector of proportions at time λ(w) in a queueing
supermarket model without departures. The model consists of
m queues with capacity C, where customers arrive according
to a Poisson process with rate m. At each arrival, a subset of
d queues is chosen and the customer joins the shortest one,
ties being broken at random. Let $W^m(t)$ be the vector of the
proportions of the m queues with 0, 1, ..., C customers at time t.
It is more convenient to deal with the tail proportions defined
as

$$U^m_k(t) = \sum_{i \ge k} W^m_i(t), \quad 0 \le k \le C.$$

Given $W^m(0) = s(w)$, we have that

$$\widehat W^m_1 = W^m(\lambda(w)).$$

By the convergence of the Markov process $(U^m(t))$ to its
fluid limit (see Vvedenskaya et al. [13]), $\widehat W^m_1$
converges in distribution to v(λ(w)), where v is associated
with u, the fluid limit of $(U^m(t))$, that is, the unique solution
of the differential system

$$\frac{du_k}{dt} = u_{k-1}^d - u_k^d \quad (1 \le k \le C), \qquad u_0 = 1, \qquad (7)$$

with initial condition u(0) corresponding to v(0) = s(w).
Moreover, using the continuity of the solution of a differential
equation with respect to the initial condition, for each ε > 0,

$$\sup_{w \in \mathcal{P}^{(r)}_m} P\Big( \sup_{1 \le k \le C} \big| U^m_k(\lambda(w)) - u_k(\lambda(w)) \big| > \varepsilon \Big) \xrightarrow[m \to +\infty]{} 0,$$

which straightforwardly leads to

$$\sup_{w \in \mathcal{P}^{(r)}_m} P\big( \| \widehat W^m_1 - G(w) \| > \varepsilon \big) \xrightarrow[m \to +\infty]{} 0,$$

where G(w) = v(λ(w)). This ends the proof of the lemma.



The argument to obtain Proposition 1 from Lemma 1 is
standard and detailed in [7, Proposition 1]. It is omitted here.



D. Fixed point of the dynamical system 
The function w ↦ G(w) is continuous on the convex compact set
$\mathcal{P}^{(r)}$ and maps it into itself; by Brouwer's fixed-point
theorem, it has a fixed point.

It remains to prove the uniqueness of the fixed point. Recall
that, for d = 1, the proof is based on the interpretation of the
fixed point equation

$$G(w) = w$$

as the invariant measure equation

$$wP = w$$

of some ergodic Markov chain with transition P. This Markov
chain is the queue length at the service completion times of
an M/G/1/C queue with deterministic service times equal to
1 and arrival rate λ(w). The proof of the uniqueness of the
solution of $w = \mu_{\lambda(w)}$, where $\mu_\lambda$ denotes the invariant
measure of this chain for arrival rate λ, is then based on the
coupling argument that, if λ < λ', then $\mu_\lambda$ is stochastically
dominated by $\mu_{\lambda'}$ (see [7] for details).

Let us try to extend the argument to the case d ≥ 2. For
that, let us consider the following system. Balls are thrown into
m urns with capacity C according to a Poisson arrival process with
rate λm. Each ball joins the least loaded urn among a subset
of d urns chosen at random, ties being broken uniformly.
At each unit time, one ball is removed from each non-empty
urn. It can be proved as in Proposition 1 that the vector of the
proportions of urns with k balls just before time n converges,
when m is large, to a dynamical system

$$w_{n+1} = H(w_n)$$

where, for v defined as previously with initial condition
v(0) = s(w),

$$H(w) = v(\lambda).$$

But the argument fails for d ≥ 2. Indeed, the equation w =
H(w) cannot be interpreted as the invariant measure equation
of some ergodic Markov chain $(i_n)_{n\in\mathbb{N}}$ on {0, ..., C},
because the differential system (7) is not linear for d > 1. If
it were, there would exist a transition matrix $P_\lambda$ such that

$$H(w) = v(\lambda) = v(0) P_\lambda = s(w) P_\lambda = w P,$$

where $P = Q P_\lambda$, because s(w) can easily be written as wQ,
where Q is a transition matrix.

Thus another proof must be found and, at this
point, the uniqueness of the fixed point is conjectured.
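The obstruction can be observed numerically. For d = 1 the system (7) is affine in u, so its time-t flow maps a mixture of two initial tail vectors to the same mixture of the two solutions, which is what makes the form $v(0)P_\lambda$ possible; for d = 2 this fails. The Python sketch below is illustrative; the function names, the initial vectors (C = 3), the horizon and the step count are assumptions.

```python
def flow(u, d, t, steps=1000):
    """Hypothetical helper: integrate du_k/dt = u_{k-1}^d - u_k^d, k = 1..C,
    with u_0 = 1, for a time t by classical RK4; u = (u_1, ..., u_C)."""

    def rhs(x):
        full = [1.0] + x                       # prepend u_0 = 1
        return [full[k - 1] ** d - full[k] ** d for k in range(1, len(full))]

    h = t / steps
    x = list(u)
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs([a + h / 2 * b for a, b in zip(x, k1)])
        k3 = rhs([a + h / 2 * b for a, b in zip(x, k2)])
        k4 = rhs([a + h * b for a, b in zip(x, k3)])
        x = [a + h / 6 * (p + 2 * q + 2 * s + e)
             for a, p, q, s, e in zip(x, k1, k2, k3, k4)]
    return x


def mixing_gap(d, t=1.0, alpha=0.5):
    """Hypothetical helper: largest gap between the flow of a mixture of two
    initial tail vectors and the same mixture of the two flows; it vanishes
    (up to rounding) when the flow is affine."""
    u, u2 = [0.5, 0.2, 0.0], [0.2, 0.0, 0.0]
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(u, u2)]
    left = flow(mix, d, t)
    right = [alpha * a + (1 - alpha) * b
             for a, b in zip(flow(u, d, t), flow(u2, d, t))]
    return max(abs(a - b) for a, b in zip(left, right))


print(mixing_gap(1), mixing_gap(2))
```

The first number is at rounding level, while the second is of the order of a few thousandths, which is the non-linearity invoked above.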
E. Identification of the fixed point

If the capacity is infinite, no ball is ever rejected, so at the
fixed point the ⌈rm⌉ balls removed at each refreshment must be
replaced between two refreshments; hence the parameter λ(w) is
equal to r. In this case, it is simple to obtain an explicit
expression of $w_1$, which is a good approximation for the case
C = 20. This is the purpose of this section.

Assume that C = +∞. By definition,

$$\lambda(w) = \int_{r-w_1}^{r} \frac{du}{1-u^d} = F(r) - F(r - w_1),$$

where $F(x) = \int_0^x \frac{du}{1-u^d}$ defines a bijection from [0, 1[ to its
image. Using that λ(w) = r for C = +∞, it can be rewritten

$$r = F(r) - F(r - w_1),$$

or

$$w_1 = r - F^{-1}\big(F(r) - r\big). \qquad (8)$$

Notice that for d ∈ N, F has an explicit expression. For d = 1,
F(x) = −log(1 − x), which leads (see [7]) to

$$w_1 = (1 - r)(e^r - 1).$$

Moreover, for d = 2, $F(x) = \operatorname{argth} x$ and thus

$$w_1 = r - \frac{r - \operatorname{th} r}{1 - r \operatorname{th} r}. \qquad (9)$$

In Figure 1, $w_1$ is plotted for d = 1, 2.

Fig. 1. Limit proportion $w_1$ of counters with value 1 for d = 1, 2.
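Equation (8) also gives $w_1$ for d ≥ 3, for which no closed form is written here, by inverting F numerically. The Python sketch below is illustrative; the function names and tolerances are assumptions. It uses a midpoint rule for F and bisection for $F^{-1}$, and can be checked against the closed forms for d = 1 and d = 2.

```python
import math


def F(x, d, n=4000):
    """Hypothetical helper: F(x) = integral from 0 to x of du / (1 - u^d),
    midpoint rule, for 0 <= x < 1."""
    h = x / n
    return sum(h / (1.0 - ((i + 0.5) * h) ** d) for i in range(n))


def w1_infinite_capacity(r, d, tol=1e-10):
    """Hypothetical helper: w_1 = r - F^{-1}(F(r) - r), equation (8),
    with F^{-1} computed by bisection on [0, r]."""
    target = F(r, d) - r              # in [0, F(r)) since F(r) >= r
    lo, hi = 0.0, r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid, d) < target:        # F is increasing
            lo = mid
        else:
            hi = mid
    return r - (lo + hi) / 2


for r in (0.3, 0.5, 0.7):
    closed_d1 = (1 - r) * (math.exp(r) - 1)
    closed_d2 = r - (r - math.tanh(r)) / (1 - r * math.tanh(r))
    print(r, w1_infinite_capacity(r, 1), closed_d1,
          w1_infinite_capacity(r, 2), closed_d2, w1_infinite_capacity(r, 3))
```

The last column gives the d = 3 value obtained from (8) under the same infinite-capacity assumption.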



F. Convergence of invariant measures 

A first result is obtained. It is proved in [7, Proposition 2]
and recalled here without proof.

Proposition 2: For m ∈ N, let $\pi_m$ be the stationary
distribution of $(W^m_n)_{n\in\mathbb{N}}$. Define P as the transition on
$\mathcal{P}^{(r)}$ given by $P(w, \cdot) = \delta_{G(w)}$. Any limiting point π of
$(\pi_m)_{m\in\mathbb{N}}$ is a probability measure on $\mathcal{P}^{(r)}$ which is
invariant for P, i.e., which satisfies G(π) = π (where G(π) denotes
the image of π under G).

As noticed in the introduction, the limiting point of $\pi_m$ is
not necessarily unique, because there may be several measures
π such that G(π) = π. Even if G has a unique fixed point w, so
that $G(\delta_w) = \delta_w$, imagine that G has cycles, i.e.,
that there exist n ≥ 2 and $w^{(1)}, \ldots, w^{(n)}$ in $\mathcal{P}^{(r)}$ such that

$$G(w^{(i)}) = w^{(i+1)} \ (1 \le i < n), \qquad G(w^{(n)}) = w^{(1)};$$

then $\pi = \frac{1}{n}\sum_{i=1}^{n} \delta_{w^{(i)}}$ is invariant under G. This gives two
different candidate limiting points for $\pi_m$. A way to prove the
convergence is to exhibit a Lyapounov function for G (see [7, Theorem 1]
for details). Such a Lyapounov function is exhibited in [7] for
d = 1 and C = 2. It is not investigated here.

G. General mice size distribution 

The aim of this subsection is to extend the previous results to
a model with a general size distribution. An approximate model
is used. Indeed, as mice are short (their mean size is a few
packets, close to 4 in real traffic traces), an approximate
model is to consider that the packets of each mouse are thrown
without interleaving into the target counters. It means that the
packets of a given mouse arrive consecutively in the filter.

The model chosen is thus an urn and ball model where
balls are thrown in batches. The balls of a batch are thrown
together into a single urn, the least loaded among d urns chosen
at random among the m urns. The ith batch is composed of $S_i$
balls, where the $S_i$'s are independent random variables with
distribution denoted by p.

Let also $(W^m_n)_{n\in\mathbb{N}}$ be the sequence of vectors giving the
proportions of urns with 0, ..., C balls just before the nth
refreshment time in this model where balls are thrown in batches.
The dynamics are the same: if, before a refreshment time, the state
is $w \in \mathcal{P}^{(r)}_m$, it becomes s(w), and then a number $\tau^m_1(w)$,
defined as in (4), of successive batches are thrown into urns until
⌈rm⌉ urns are non-empty. The model generalizes the previous
one obtained for mice of size one (p(1) = 1).

When C = +∞, about rm/E(S) batches are thrown between two
refreshments at the fixed point, so the relation λ(w) = r used
for (8) becomes λ(w) = r/E(S). Equation (9) is thus extended in
this case to

$$w_1 = r - \frac{r - \operatorname{th}(r/E(S))}{1 - r \operatorname{th}(r/E(S))}. \qquad (10)$$

Let G be defined on $\mathcal{P}^{(r)}$ by

$$G(w) = v(\lambda(w)) \qquad (11)$$

where the tail function (u(t)) corresponding to (v(t)) is the
unique solution of the differential equation

$$\frac{du_k}{dt} = \sum_{j \ge 1} p(j)\,\big(u_{k-j}^d - u_k^d\big) \quad (1 \le k \le C), \qquad u_j = 1 \text{ for } j \le 0. \qquad (12)$$

Everything in Section II remains valid. The supermarket model
obtained by coupling is a model with batch arrivals and without
departures. Its mean-field limit is obtained as previously and
leads to the differential equation (12). Propositions 1 and
2 hold. The description of the fixed point can be
extended accordingly.
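The batch model is easy to simulate, which gives an empirical counterpart of (10). In the Python sketch below, the capacity is taken infinite to match the assumption behind (10), and mouse sizes are drawn uniformly from {1, ..., 7} (mean 4); this is one possible choice of p, not necessarily the one used in the paper. The function name, the seed, the burn-in and the number of averaged refreshments are also assumptions.

```python
import math
import random


def simulate_batches(m, r, d, sizes, n_burn, n_keep, seed=1):
    """Hypothetical sketch of the batch urn model with infinite capacity:
    each mouse is a batch put into the least loaded of d urns chosen at
    random; one ball is removed from every non-empty urn as soon as
    ceil(r*m) urns are non-empty. Returns the proportion of urns holding
    exactly one ball just before a refreshment, averaged over n_keep
    refreshments after n_burn refreshments of burn-in."""
    rng = random.Random(seed)
    urns = [0] * m
    threshold = math.ceil(r * m)
    non_empty, seen, acc = 0, 0, 0.0
    while seen < n_burn + n_keep:
        chosen = rng.sample(range(m), d)
        low = min(urns[i] for i in chosen)
        target = rng.choice([i for i in chosen if urns[i] == low])
        if low == 0:
            non_empty += 1
        urns[target] += rng.choice(sizes)     # the whole mouse lands in one urn
        if non_empty >= threshold:
            if seen >= n_burn:
                acc += sum(1 for x in urns if x == 1) / m
            seen += 1
            for i in range(m):
                if urns[i] > 0:
                    urns[i] -= 1
                    if urns[i] == 0:
                        non_empty -= 1
    return acc / n_keep


# Sizes uniform on {1, ..., 7} (mean 4), r = 0.5, d = 2, m = 2000.
est = simulate_batches(2000, 0.5, 2, range(1, 8), n_burn=100, n_keep=150)
print(est)
```

The printed average can be compared with the value of (10) for r = 0.5 and E(S) = 4; finite m, finite-sample noise and the choice of p all make the match approximate.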

III. Experiments 

In this section, the proposed algorithm is tested against an
ADSL traffic trace from the France Telecom IP backbone network.
This traffic trace was captured on a Gigabit Ethernet
link in October 2003 between 9:00 pm and 10:00 pm. This
period corresponds to peak activity of ADSL customers;
the trace lasts 1 hour and contains more than 10 million
TCP flows.

In our experiments, the filter consists of m = 2^20 counters
associated with two independent hashing functions (d = 2).
Elephants are here defined as flows with at least 20 packets
(K = 20).
Fig. 2. Impact of the supermarket model on the estimation of the number of
elephants (curves: original version of the algorithm; supermarket model),
r = 50%, France Telecom trace

The relative error on the estimated number of elephants is
plotted in Figure 2. Two different versions of the algorithm are
considered: the original algorithm developed in [5], [6] and
the proposed algorithm using the supermarket model. We recall
that both algorithms use the min-rule (incrementing
only the smallest counter), but in different ways: in case of
a tie, only one counter, chosen at random, is incremented with the
supermarket model, whereas both counters are incremented
in the original version of the algorithm. Results show that both
methods give a good estimation of the number of elephants over
the whole duration of the trace.
Figure 3 presents the inter-refreshment time (duration between
two successive refreshments, in terms of number of arriving
packets) for the whole traffic trace. It can be noticed that
the stationary phase is reached at the A'th refreshment time.
So the transition phase seems to be rather short, according to
experiments. The stationary inter-refreshment time of the
algorithm based on the supermarket model is higher than the
one obtained with the original version of the algorithm. This
can be explained by the fact that with the supermarket model
every arriving packet increments exactly one counter, whereas
in the original version, if the two selected counters are equal,
they are both incremented by one. In particular, when they are
both null, both become non-null. As a consequence, the
proportion of non-null counters grows faster and the filling-up
threshold r is reached more quickly.

Figure 3 explains the behavior of the algorithms plotted in
Figure 2. In the original algorithm, the inter-refreshment time
is lower, so more elephants are missed and the error is negative.
In the supermarket version, the error is positive due to false
positives.

Fig. 3. Duration of the transition and the stationary phase, r = 50%,
France Telecom trace

Fig. 4. Comparison between rm and the stationary inter-refreshment time,
France Telecom trace

In Figure 4, the impact of r on the stationary inter-refreshment
time $\tau^m$ is investigated. More precisely, $\tau^m/rm$ is plotted for
various values of r. According to experiments, $\tau^m$ is very close
to rm. In fact, the refreshment can be seen as removing rm from
the sum S of all counters (it decreases by one all non-null
counters, which number exactly rm since the refreshment is
performed as soon as the filling-up threshold r is reached). As
we are in the stationary phase, the proportion $W_i$ of counters
at i converges for every i ∈ {0, ..., C}. Therefore the sum of all
counters converges. So, before the next refreshment, rm packets
must be inserted into the filter for S to recover its former
value. Packets belonging to elephants that have already been
detected are not taken into account: those packets are very
numerous and are not inserted into the filter, to avoid
polluting it.



IV. Discussion 

The performance of the algorithm clearly depends on the
filling-up threshold r. To obtain a good estimation of the number
of elephants, r must be around 50%. For higher values of r, the
number of elephants is largely overestimated due to false
positives. The key quantity is the stationary proportion $w_i$ of
counters at i when m gets large. An explicit expression for w
is not available, even if a numerical value could be computed.
Nevertheless, less ambitiously, one could simply find
the critical value of r for which $w_C$ becomes non-negligible. At
least, the impact of r on w is shown here by simulation.
Fig. 5. Impact of r on the limit stationary distribution w (x-axis: counter
value i, from 1 to 10), simulation using only mice with a uniform size
distribution of mean 4, m = 2000

Figure 5 is not based on a real traffic trace but on
simulation. The objective here is to evaluate the limit
stationary distribution w for traffic composed only of mice.
The mean mouse size is taken equal to four to be close to
real traffic (this value is deduced from the real traffic
trace). Under these conditions, the limit stationary
distribution w is decreasing when r equals 50%. For a
filling-up threshold of 90%, counters are much more likely
to be high: most counter values are around six and there
are many counters at C. This explains why the algorithm
performs better with a filling-up threshold around 50%.
Equation (10) with C = +∞ gives a very good approximation
of the values of $w_1$ in this case. Indeed, the analytical
expression gives $w_1$ ≈ 0.10 for r = 0.5 and $w_1$ ≈ 0.05 for
r = 0.9. These values are very close to the values obtained
by simulation in Figure 5.
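The two analytical values quoted above can be reproduced from (10) in a few lines, taking d = 2 and E(S) = 4 as in the simulation; the function name below is hypothetical.

```python
import math


def w1_batches(r, mean_size):
    """Hypothetical helper: equation (10), limit proportion of counters at 1
    for d = 2 and C = +inf, when mice have mean size mean_size."""
    t = math.tanh(r / mean_size)
    return r - (r - t) / (1 - r * t)


print(round(w1_batches(0.5, 4), 4), round(w1_batches(0.9, 4), 4))  # prints 0.0994 0.0525
```

Rounded to two decimals, these are the values 0.10 and 0.05 used in the comparison with Figure 5.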

V. Conclusion 

We analyze in this paper a new algorithm detecting elephants
on-line in the Internet. This algorithm is based on
Bloom filters with a refreshment mechanism that depends
on the current traffic intensity. It also uses a conservative
way to update counters, called the min-rule. The latter
consists of incrementing the lowest counter among a set of d
counters chosen at random, as in the supermarket model, which
provides a much lower tail distribution for the counter values.
For a model involving only mice, limit theorems investigate the
existence of a deterministic limit for the empirical distribution
of counter values when the filter size gets large. This limit
can be exploited to adjust the parameters of the algorithm
so that it performs well. The accuracy of the algorithm and some
theoretical results are tested against a traffic trace from France
Telecom and by simulations.